Abstract

We consider two nonparametric estimators for the risk measure of the sum of n 
i.i.d. individual insurance risks where the number of historical single claims that 
are used for the statistical estimation is of order n. This framework matches the 
situation that nonlife insurance companies are faced with in the scope of
premium calculation. Indeed, the risk measure of the aggregate risk divided by 
n can be seen as a suitable premium for each of the individual risks. For both 
estimators divided by n we derive a sort of Marcinkiewicz-Zygmund strong law as 
well as a weak limit theorem. The behavior of the estimators for small to moderate 
n is studied by means of Monte-Carlo simulations. 
1. Introduction


Let $(X_i)$ be a sequence of nonnegative i.i.d. random variables on a common probability
space with distribution $\mu$. In the context of actuarial theory, the random variable
$S_n := \sum_{i=1}^n X_i$ can be seen as the total claim of a homogeneous insurance collective
consisting of $n$ risks. The distribution of $S_n$ is given by the $n$-fold convolution
$\mu^{*n}$ of $\mu$. A central task in insurance practice is the specification of the premium
$\mathcal{R}_\rho(\mu^{*n})$ for the aggregate risk $S_n$, where $\mathcal{R}_\rho$ is the
statistical functional associated with any suitable law-invariant risk measure $\rho$
(henceforth referred to as risk functional associated with $\rho$). Note that
$\frac{1}{n}\mathcal{R}_\rho(\mu^{*n})$ can be seen as a suitable premium for each of the
individual risks $X_1, \ldots, X_n$, where it is important to note that
$\frac{1}{n}\mathcal{R}_\rho(\mu^{*n})$ is typically essentially smaller than $\mathcal{R}_\rho(\mu)$.

On the one hand, much is known about the statistical estimation of the single claim
distribution $\mu$ and about the numerical approximation of the convolution $\mu^{*n}$ with
known $\mu$. On the other hand, an analysis that combines both statistical aspects and the
numerical approximation of $\mu^{*n}$ seems to be rare. In [10], this question was approached
through an estimation of $\mu^{*n}$ by the normal distribution
$\mathcal{N}_{n\hat m_{u_n},\, n\hat s^2_{u_n}}$ with estimated parameters based on a sample of
size $u_n \in \mathbb{N}$. Here $\hat m_{u_n}$ and $\hat s^2_{u_n}$ refer to respectively
the empirical mean and the empirical variance of a sequence of $u_n$ i.i.d. random variables
with distribution $\mu$ having a finite second moment. It was shown in [10] that for many
law-invariant coherent risk measures $\rho$ and any sequence $(u_n)$ of positive integers for
which $u_n/n$ converges to some constant $c \in (0,\infty)$ we have
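
$$ n^{r-1}\big(\mathcal{R}_\rho(\mathcal{N}_{n\hat m_{u_n},\, n\hat s^2_{u_n}}) - \mathcal{R}_\rho(\mu^{*n})\big) \longrightarrow 0 \quad \mathbb{P}\text{-a.s.} \tag{1} $$

for every $r < 1/2$, and

$$ \mathrm{law}\Big\{ u_n^{1/2}\, \frac{\mathcal{R}_\rho(\mathcal{N}_{n\hat m_{u_n},\, n\hat s^2_{u_n}}) - \mathcal{R}_\rho(\mu^{*n})}{n} \Big\} \xrightarrow{\;w\;} \mathcal{N}_{0,s^2}, \qquad n \to \infty, \tag{2} $$
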
with $s^2 := \mathrm{Var}[X_1]$. Of course, (2) implies in particular that the convergence in (1)
cannot hold for $r \ge 1/2$. The assumption that $u_n$ increases to infinity at the same speed
as $n$ increases to infinity is motivated by the fact that the parameters are typically
estimated on the basis of the historical claims of the same collective from the last year
or from the last few years. This is also why the presented theory is nonstandard. In
the existing literature on the statistical estimation of convolutions the number of
summands is typically fixed or increases essentially slower to infinity than $u_n$ does; see, for
instance, [18] for the nonparametric estimation of a (compound) convolution where the
(distribution of the) number of summands is fixed and known. It was also shown in [10]
that for the exact mean $m$ and the exact variance $s^2$ of $\mu$, and for many law-invariant
coherent risk measures $\rho$,

$$ \sup_{n \in \mathbb{N}} \big|\mathcal{R}_\rho(\mathcal{N}_{nm,\, ns^2}) - \mathcal{R}_\rho(\mu^{*n})\big| < \infty. \tag{3} $$

Both (1)-(3) and the simulation study in [10] show that the overwhelming part of the
error in the estimated normal approximation of the risk functional is due to the
estimation of the unknown parameters rather than to the numerical approximation itself.
Whereas in the case of known parameters the relative error converges to zero at rate
(nearly) 1, in the case of estimated parameters the relative error converges to zero only
at rate (nearly) 1/2. So it is very important to note that statistical aspects may not be
neglected when investigating approximations of premiums for aggregate risks.

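The estimated normal approximation $\mathcal{R}_\rho(\mathcal{N}_{n\hat m_{u_n},\, n\hat s^2_{u_n}})$
of $\mathcal{R}_\rho(\mu^{*n})$ is very simple and saves a great deal of computing time. Indeed, we have

$$ \mathcal{R}_\rho(\mathcal{N}_{n\hat m_{u_n},\, n\hat s^2_{u_n}}) = n\,\hat m_{u_n} + \sqrt{n}\,\hat s_{u_n}\,\mathcal{R}_\rho(\mathcal{N}_{0,1}) \tag{4} $$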


whenever $\mathcal{R}_\rho$ corresponds to a cash additive and positively homogeneous risk measure
$\rho$. On the other hand, in real applications the total claim distribution $\mu^{*n}$ is typically
skewed to the right, whereas the normal distribution is symmetric; see also Figure 1.
So it is natural to study methods which better fit skewed total claim distributions. In
this article, we will therefore replace $\mathcal{N}_{n\hat m_{u_n},\, n\hat s^2_{u_n}}$ by the
$n$-fold convolution $\hat\mu_{u_n}^{*n}$ of the empirical estimator $\hat\mu_{u_n}$ of $\mu$.
The corresponding estimator $\mathcal{R}_\rho(\hat\mu_{u_n}^{*n})$ will be referred
to as empirical plug-in estimator. The calculation of the empirical plug-in estimator
requires more computing time than the calculation of the estimated normal
approximation; nevertheless the computing time needed is still acceptable for actuarial
applications. It is quite clear, and can also be seen from Figure 1, that $\mu^{*n}$ gets
increasingly skewed as the tail of $\mu$ gets heavier. So it is not surprising that the estimated
normal approximation works well for light-tailed $\mu$ and gets worse for medium-tailed
and heavy-tailed $\mu$. A simulation study for the Value at Risk functional in Section 4
indicates that the empirical plug-in estimator is only slightly better than the estimated
normal approximation for light-tailed $\mu$ but is essentially better for medium-tailed $\mu$.
For heavy-tailed $\mu$ both estimators work well only for rather large $n$. Throughout this
article we will use the terms “light-tailed”, “medium-tailed” and “heavy-tailed” rather
loosely. By definition “heavy-tailed” refers to distributions without a finite
second moment. However, our theory is only applicable to distributions with a finite
$\lambda$-moment for some $\lambda > 2$. In this context we refer to heavy-tailed distributions whenever
$\lambda$ is close to 2 and will use the terms “medium-tailed” and “light-tailed” for larger $\lambda$.
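
The closed form behind the estimated normal approximation can be illustrated with a few lines of code. The following is a minimal sketch for the Value at Risk only; the function name `normal_approx_var`, the toy claim data, and the use of the $1/u$ normalisation for the empirical variance are assumptions made for this illustration and are not taken from the article.

```python
import math
from statistics import NormalDist, fmean, pvariance


def normal_approx_var(claims, n, alpha):
    """Estimated normal approximation of VaR_alpha for the total claim of n risks.

    Uses n * m_hat + sqrt(n) * s_hat * VaR_alpha(N(0,1)), which is valid for any
    cash additive and positively homogeneous risk functional.
    """
    m_hat = fmean(claims)
    # Assumption: empirical variance with 1/u normalisation.
    s_hat = math.sqrt(pvariance(claims, mu=m_hat))
    z_alpha = NormalDist().inv_cdf(alpha)  # VaR_alpha of the standard normal
    return n * m_hat + math.sqrt(n) * s_hat * z_alpha


# Hypothetical historical single claims (most policies report no claim).
claims = [0.0, 0.0, 0.0, 12.0, 0.0, 3.5, 0.0, 0.0, 20.0, 0.0]
n = 100
print("individual premium:", normal_approx_var(claims, n, 0.99) / n)
```

Dividing the result by $n$ gives the estimated individual premium; no convolution has to be computed, which is why this estimator is so cheap.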

To introduce the empirical plug-in estimator rigorously, let $(Y_i)$ be a sequence of i.i.d.
random variables on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with distribution
$\mu$. The random variables $Y_i$ can be seen as observed historical single claims. The empirical
probability measure of the first $u \in \mathbb{N}$ observations,

$$ \hat\mu_u := \frac{1}{u} \sum_{i=1}^{u} \delta_{Y_i}, \tag{5} $$

is the standard nonparametric estimator for $\mu$, and therefore

$$ \hat\mu_{u_n}^{*n} := (\hat\mu_{u_n})^{*n} \tag{6} $$

provides a reasonable estimator for $\mu^{*n}$. Then it is natural to use the plug-in estimator

$$ \mathcal{R}_\rho(\hat\mu_{u_n}^{*n}) \tag{7} $$


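for the estimation of $\mathcal{R}_\rho(\mu^{*n})$; computational aspects will be discussed in
Appendix A. We will see in Section 2 that for a very large class of law-invariant risk measures
$\rho$, any distribution $\mu$ with a finite $\lambda$-moment for some $\lambda > 2$, and any
sequence $(u_n)$ of positive integers for which $u_n/n$ converges to some constant
$c \in (0,\infty)$, we also have (1)-(2) with $\mathcal{N}_{n\hat m_{u_n},\, n\hat s^2_{u_n}}$
replaced by $\hat\mu_{u_n}^{*n}$. In Theorems 2.2 and 2.3 we will prove even more, namely

$$ \frac{1}{n}\big(\mathcal{R}_\rho(\mathcal{N}_{n\hat m_{u_n},\, n\hat s^2_{u_n}}) - \mathcal{R}_\rho(\mu^{*n})\big) = (\hat m_{u_n} - m) + o_{\mathbb{P}\text{-a.s.}}(n^{-1/2}), \tag{8} $$

$$ \frac{1}{n}\big(\mathcal{R}_\rho(\hat\mu_{u_n}^{*n}) - \mathcal{R}_\rho(\mu^{*n})\big) = (\hat m_{u_n} - m) + o_{\mathbb{P}\text{-a.s.}}(n^{-1/2}), \tag{9} $$

where $o_{\mathbb{P}\text{-a.s.}}(n^{-1/2})$ refers to any sequence $(\xi_n)$ of random variables on
$(\Omega, \mathcal{F}, \mathbb{P})$ for which $\sqrt{n}\,\xi_n$ converges $\mathbb{P}$-a.s. to zero
as $n \to \infty$. Assertions (8)-(9) have an astonishing consequence. No matter what the
particular risk measure $\rho$ looks like, the asymptotics of the estimators
$\frac{1}{n}\mathcal{R}_\rho(\mathcal{N}_{n\hat m_{u_n},\, n\hat s^2_{u_n}})$ and
$\frac{1}{n}\mathcal{R}_\rho(\hat\mu_{u_n}^{*n})$ for the individual premium
$\frac{1}{n}\mathcal{R}_\rho(\mu^{*n})$ are exactly the same as for the empirical mean regarded
as an estimator for the mean. By the classical Central Limit Theorem, we can derive from (8)
and (9) the following asymptotic confidence intervals at level $1 - \alpha$ for the individual
premium $\frac{1}{n}\mathcal{R}_\rho(\mu^{*n})$: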
where $\Phi_{0,1}$ denotes the distribution function of $\mathcal{N}_{0,1}$.
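
A minimal sketch of how such an interval might be computed. It assumes the symmetric form "estimate plus or minus $\hat s_{u_n} u_n^{-1/2}\,\Phi_{0,1}^{-1}(1-\alpha/2)$", which is what (8)-(9) together with the Central Limit Theorem for the empirical mean suggest; the function name and the numbers in the usage line are hypothetical.

```python
import math
from statistics import NormalDist


def premium_confidence_interval(premium_estimate, s_hat, u, alpha):
    """Asymptotic (1 - alpha) confidence interval for the individual premium.

    Assumption: the estimation error behaves like that of the empirical mean
    of u historical claims, so the half-width is s_hat / sqrt(u) * z.
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = s_hat / math.sqrt(u) * z
    return premium_estimate - half_width, premium_estimate + half_width


# Hypothetical inputs: estimated premium 1.5, empirical std 2.0, 400 claims.
print(premium_confidence_interval(1.5, 2.0, 400, 0.05))
```

The same half-width applies to both estimators, since their leading error terms coincide.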

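Further, it is a simple consequence of part (ii) of Theorem 2.2 below that

$$ \frac{1}{n}\mathcal{R}_\rho(\mu^{*n}) = m + \frac{1}{\sqrt{n}}\, s\, \mathcal{R}_\rho(\mathcal{N}_{0,1}) + O(n^{-1/2-\beta\gamma}) \tag{10} $$
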
with $\gamma := \min\{\lambda - 2;\, 1\}/2$ and $\beta > 0$ depending on $\rho$. The identity (10)
shows that for large $n$ (and $\beta$ and $\gamma$ away from 0) the individual premium
$\frac{1}{n}\mathcal{R}_\rho(\mu^{*n})$ can be seen as an approximation of the premium which is
determined according to the standard deviation principle with safety loading
$\frac{1}{\sqrt{n}}\mathcal{R}_\rho(\mathcal{N}_{0,1})$. For the corresponding estimators
$\frac{1}{n}\mathcal{R}_\rho(\mathcal{N}_{n\hat m_{u_n},\, n\hat s^2_{u_n}})$ and
$\frac{1}{n}\mathcal{R}_\rho(\hat\mu_{u_n}^{*n})$
we will obtain (cf. Remark 2.5 below) the following empirical analogues of (10):

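$$ \frac{1}{n}\mathcal{R}_\rho(\mathcal{N}_{n\hat m_{u_n},\, n\hat s^2_{u_n}}) = \hat m_{u_n} + \frac{1}{\sqrt{n}}\,\hat s_{u_n}\,\mathcal{R}_\rho(\mathcal{N}_{0,1}), $$

$$ \frac{1}{n}\mathcal{R}_\rho(\hat\mu_{u_n}^{*n}) = \hat m_{u_n} + \frac{1}{\sqrt{n}}\,\hat s_{u_n}\,\mathcal{R}_\rho(\mathcal{N}_{0,1}) + O_{\mathbb{P}\text{-a.s.}}(n^{-1/2-\beta\gamma}), \tag{11} $$

where $O_{\mathbb{P}\text{-a.s.}}(n^{-1/2-\beta\gamma})$ refers to any sequence of random variables
$(\xi_n)$ on $(\Omega, \mathcal{F}, \mathbb{P})$ for which the sequence
$(n^{1/2+\beta\gamma}\,\xi_n)$ is bounded $\mathbb{P}$-a.s. To some extent, (10) and (11) justify
the use of the standard deviation principle (with $m$ and $s$ estimated by $\hat m_{u_n}$ and
$\hat s_{u_n}$, respectively), which many insurance companies use to determine individual premiums
in large collectives. In practice the specific choice of the safety loading in the context
of the standard deviation principle is often somewhat arbitrary. Formulae (10) and (11)
now give a deeper insight into the practical choice of the safety loading. It should be
chosen as the product of a suitable risk functional (which one actually has in mind)
evaluated at the standard normal distribution and the factor $1/\sqrt{n}$ (where $n$ is the size
of the collective). The factor $1/\sqrt{n}$ reflects the balancing of risks in (large) collectives.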
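
The recipe for the safety loading can be made concrete as follows. This is only a sketch: the helper names are hypothetical, and the two risk functionals shown (Value at Risk and Average Value at Risk of the standard normal distribution, the latter via the well-known formula $\varphi(z_\alpha)/(1-\alpha)$) are merely examples of what one might plug in.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def var_std_normal(alpha):
    """Value at Risk at level alpha of N(0,1), i.e. the alpha-quantile."""
    return STD_NORMAL.inv_cdf(alpha)


def avar_std_normal(alpha):
    """Average Value at Risk at level alpha of N(0,1): pdf(z_alpha) / (1 - alpha)."""
    return STD_NORMAL.pdf(STD_NORMAL.inv_cdf(alpha)) / (1.0 - alpha)


def safety_loading(risk_functional_at_std_normal, n):
    """Safety loading: risk functional at N(0,1) times the factor 1/sqrt(n)."""
    return risk_functional_at_std_normal / math.sqrt(n)


def std_dev_principle_premium(m_hat, s_hat, loading):
    """Individual premium under the standard deviation principle."""
    return m_hat + loading * s_hat


loading = safety_loading(var_std_normal(0.99), n=400)
print(std_dev_principle_premium(m_hat=1.0, s_hat=4.6, loading=loading))
```

Quadrupling the collective size halves the loading, which is the balancing effect mentioned above.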

It is quite clear that the quality of the estimator in (7) can be improved by
replacing the nonparametric estimator $\hat\mu_u$ in (5)-(6) by a suitable estimator that is based
on a parametric statistical model. However, this requires preliminary considerations
w.r.t. a proper choice of the parametric model. Such considerations are feasible and
common. Nevertheless we leave the parametric approach for future work.

The rest of the article is organized as follows. In Section 2 we will present our results,
and in Sections 3 and 4 these results will be illustrated by means of examples. The
results of Section 2 will be proven in Section 5. A remark on the computation of the
empirical plug-in estimator can be found in Appendix A.
2. Results

Let $L^0$ denote the usual set of all finitely-valued random variables on an atomless
probability space modulo the equivalence relation of almost sure identity. Let
$\mathcal{X} \subset L^0$ be a vector space containing the constants. An intrinsic example for
$\mathcal{X}$ is the space $L^p$ (consisting of all $p$-fold integrable random variables from
$L^0$) for $p \ge 1$. We will say that a map $\rho : \mathcal{X} \to \mathbb{R}$ is

• monotone if $\rho(X_1) \le \rho(X_2)$ for all $X_1, X_2 \in \mathcal{X}$ with $X_1 \le X_2$.

• cash additive if $\rho(X + m) = \rho(X) + m$ for all $X \in \mathcal{X}$ and $m \in \mathbb{R}$.

• subadditive if $\rho(X_1 + X_2) \le \rho(X_1) + \rho(X_2)$ for all $X_1, X_2 \in \mathcal{X}$.

• positively homogeneous if $\rho(\lambda X) = \lambda\rho(X)$ for all $X \in \mathcal{X}$ and $\lambda \ge 0$.

As usual, we will say that $\rho$ is coherent if it satisfies all of these four conditions, and
that $\rho$ is law-invariant if $\rho(X) = \rho(Y)$ whenever $X$ and $Y$ have the same law. We will
restrict ourselves to law-invariant maps $\rho : \mathcal{X} \to \mathbb{R}$. So we may and do
associate with $\rho$ a statistical functional
$\mathcal{R}_\rho : \mathcal{M}(\mathcal{X}) \to \mathbb{R}$ via

$$ \mathcal{R}_\rho(\mu) := \rho(X_\mu), \qquad \mu \in \mathcal{M}(\mathcal{X}), \tag{12} $$

where $\mathcal{M}(\mathcal{X})$ denotes the set of the distributions of the elements of
$\mathcal{X}$, and $X_\mu \in \mathcal{X}$ has distribution $\mu$.

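Let $\mathcal{M}_1$ be the set of all probability measures on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$,
and denote by $F_\mu$ the distribution function of $\mu \in \mathcal{M}_1$. For every
$\lambda > 0$, let the function $\phi_\lambda : \mathbb{R} \to [1,\infty)$ be
defined by $\phi_\lambda(x) := (1 + |x|^\lambda)$, $x \in \mathbb{R}$. For
$\mu_1, \mu_2 \in \mathcal{M}_1$, we say that

$$ d_{\phi_\lambda}(\mu_1, \mu_2) := \sup_{x \in \mathbb{R}} |F_{\mu_1}(x) - F_{\mu_2}(x)|\,\phi_\lambda(x) \tag{13} $$

is the nonuniform Kolmogorov distance of $\mu_1$ and $\mu_2$ w.r.t. the weight function $\phi_\lambda$.
It is easily seen that $d_{\phi_\lambda}$ provides a metric on the set $\mathcal{M}_1^\lambda$ of all
$\mu \in \mathcal{M}_1$ satisfying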
< oo. 
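
To get a feeling for the distance in (13), it can be approximated numerically by taking the maximum over a finite grid instead of the supremum over the whole real line. The sketch below does exactly that; the grid, the helper names, and the two normal distributions in the usage example are assumptions for illustration, and the grid maximum is only a lower bound for the true supremum.

```python
from bisect import bisect_right
from statistics import NormalDist


def weighted_kolmogorov_distance(cdf_a, cdf_b, lam, grid):
    """Grid approximation of sup_x |F_a(x) - F_b(x)| * (1 + |x|**lam)."""
    return max(abs(cdf_a(x) - cdf_b(x)) * (1.0 + abs(x) ** lam) for x in grid)


def empirical_cdf(sample):
    """Right-continuous distribution function of the empirical measure."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n


grid = [k / 100.0 for k in range(-800, 801)]
standard = NormalDist(0.0, 1.0).cdf
shifted = NormalDist(0.5, 1.0).cdf
print(weighted_kolmogorov_distance(standard, shifted, lam=2.5, grid=grid))
```

Because of the weight $1 + |x|^\lambda$, discrepancies in the tails count far more than in the unweighted Kolmogorov distance.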

Recall that $(Y_i)$ is a sequence of i.i.d. real-valued random variables on some probability
space $(\Omega, \mathcal{F}, \mathbb{P})$ with distribution $\mu$ having a finite second moment,
and that the estimators $\hat\mu_u$, $\hat\mu_{u_n}^{*n}$, and $\mathcal{R}_\rho(\hat\mu_{u_n}^{*n})$
are given by (5), (6), and (7), respectively. We set $m := \mathbb{E}[Y_1]$
and $s := \mathrm{Var}[Y_1]^{1/2}$, and let $\hat m_u := \frac{1}{u}\sum_{i=1}^{u} Y_i$ and
$\hat s_u^2 := \frac{1}{u}\sum_{i=1}^{u} (Y_i - \hat m_u)^2$ be the
corresponding standard nonparametric estimators. The following Assumption 2.1 will
be illustrated in Section 3.

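Assumption 2.1 Let $\rho : \mathcal{X} \to \mathbb{R}$ be a law-invariant map, and
$\mathcal{R}_\rho$ be the corresponding statistical functional introduced in (12). Let $(u_n)$ be
a sequence in $\mathbb{N}$, and assume that the following assertions hold for some $\lambda > 2$:

(a) $\mu \in \mathcal{M}(L^\lambda)$, that is, $\mathbb{E}[|Y_1|^\lambda] < \infty$.

(b) $u_n/n$ converges to some constant $c \in (0,\infty)$.

(c) $\rho$ is cash additive and positively homogeneous, and $\mathcal{M}_1^\lambda \subset \mathcal{M}(\mathcal{X})$.

(d) For each sequence $(m_n) \subset \mathcal{M}_1^\lambda$ with $d_{\phi_\lambda}(m_n, \mathcal{N}_{0,1}) \to 0$,
there exist constants $C, \beta > 0$ such that
$|\mathcal{R}_\rho(m_n) - \mathcal{R}_\rho(\mathcal{N}_{0,1})| \le C\, d_{\phi_\lambda}(m_n, \mathcal{N}_{0,1})^\beta$
for all $n \in \mathbb{N}$.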

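The following result is basically already known from [10]. Assertions (iv)-(v) in
Theorem 2.2 describe the asymptotic behavior of the estimator
$\frac{1}{n}\mathcal{R}_\rho(\mathcal{N}_{n\hat m_{u_n},\, n\hat s^2_{u_n}})$ for the
individual premium $\frac{1}{n}\mathcal{R}_\rho(\mu^{*n})$. Note that
$\mathcal{R}_\rho(\mathcal{N}_{n\hat m_{u_n},\, n\hat s^2_{u_n}})$ is always
$(\mathcal{F}, \mathcal{B}(\mathbb{R}))$-measurable due to the representation (4).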
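Theorem 2.2 (Estimated normal approximation) Suppose that Assumption 2.1
holds with $\lambda > 2$ and $\beta > 0$, and let $\gamma := \min\{\lambda - 2;\, 1\}/2$. Then the
following assertions hold:

(i) $\frac{1}{n}\big(\mathcal{R}_\rho(\mathcal{N}_{n\hat m_{u_n},\, n\hat s^2_{u_n}}) - \mathcal{R}_\rho(\mathcal{N}_{nm,\, ns^2})\big) = (\hat m_{u_n} - m) + o_{\mathbb{P}\text{-a.s.}}(n^{-1/2})$.

(ii) $n^{-1/2}\,\big|\mathcal{R}_\rho(\mathcal{N}_{nm,\, ns^2}) - \mathcal{R}_\rho(\mu^{*n})\big| = O(n^{-\beta\gamma})$.

(iii) $\frac{1}{n}\big(\mathcal{R}_\rho(\mathcal{N}_{n\hat m_{u_n},\, n\hat s^2_{u_n}}) - \mathcal{R}_\rho(\mu^{*n})\big) = (\hat m_{u_n} - m) + o_{\mathbb{P}\text{-a.s.}}(n^{-1/2})$.

(iv) $n^{r-1}\big(\mathcal{R}_\rho(\mathcal{N}_{n\hat m_{u_n},\, n\hat s^2_{u_n}}) - \mathcal{R}_\rho(\mu^{*n})\big) \to 0$ $\mathbb{P}$-a.s. for every $r < 1/2$.

(v) $\mathrm{law}\big\{u_n^{1/2}\, n^{-1}\big(\mathcal{R}_\rho(\mathcal{N}_{n\hat m_{u_n},\, n\hat s^2_{u_n}}) - \mathcal{R}_\rho(\mu^{*n})\big)\big\} \xrightarrow{\;w\;} \mathcal{N}_{0,s^2}$.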

The following result provides the analogue of Theorem 2.2 for the empirical plug-in
estimator $\frac{1}{n}\mathcal{R}_\rho(\hat\mu_{u_n}^{*n})$ for the individual premium
$\frac{1}{n}\mathcal{R}_\rho(\mu^{*n})$. Assertions (iii)-(iv) in
Theorem 2.3 describe the asymptotic behavior of the estimator $\frac{1}{n}\mathcal{R}_\rho(\hat\mu_{u_n}^{*n})$.

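Theorem 2.3 (Empirical plug-in estimator) Suppose that Assumption 2.1 holds
with $\lambda > 2$ and $\beta > 0$, let $\gamma := \min\{\lambda - 2;\, 1\}/2$, and assume that
$\mathcal{R}_\rho(\hat\mu_{u_n}^{*n})$ is $(\mathcal{F}, \mathcal{B}(\mathbb{R}))$-measurable
for every $n \in \mathbb{N}$. Then the following assertions hold:

(i) $\frac{1}{n}\big(\mathcal{R}_\rho(\mathcal{N}_{n\hat m_{u_n},\, n\hat s^2_{u_n}}) - \mathcal{R}_\rho(\hat\mu_{u_n}^{*n})\big) = O_{\mathbb{P}\text{-a.s.}}(n^{-1/2-\beta\gamma})$.

(ii) $\frac{1}{n}\big(\mathcal{R}_\rho(\hat\mu_{u_n}^{*n}) - \mathcal{R}_\rho(\mu^{*n})\big) = (\hat m_{u_n} - m) + o_{\mathbb{P}\text{-a.s.}}(n^{-1/2})$.

(iii) $n^{r-1}\big(\mathcal{R}_\rho(\hat\mu_{u_n}^{*n}) - \mathcal{R}_\rho(\mu^{*n})\big) \to 0$ $\mathbb{P}$-a.s. for every $r < 1/2$.

(iv) $\mathrm{law}\big\{u_n^{1/2}\, n^{-1}\big(\mathcal{R}_\rho(\hat\mu_{u_n}^{*n}) - \mathcal{R}_\rho(\mu^{*n})\big)\big\} \xrightarrow{\;w\;} \mathcal{N}_{0,s^2}$.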

Note that the measurability assumption on $\mathcal{R}_\rho(\hat\mu_{u_n}^{*n})$ in Theorem 2.3 is
not very restrictive. For instance, when $\rho$ is the Value at Risk or a distortion risk measure
(see Sections 3.1-3.2 for details), then it can be easily seen that
$\mathcal{R}_\rho(\hat\mu_{u_n}^{*n})$ is $(\mathcal{F}, \mathcal{B}(\mathbb{R}))$-measurable.
Moreover, the measurability also holds for any law-invariant coherent risk measure $\rho$
which is defined on $L^p$ for some $p \in [1,\infty)$:

Remark 2.4 Let $\mathcal{X} = L^p$ for some $p \in [1,\infty)$. Then for every law-invariant
coherent risk measure $\rho : L^p \to \mathbb{R}$ the estimator
$\mathcal{R}_\rho(\hat\mu_{u_n}^{*n})$ is $(\mathcal{F}, \mathcal{B}(\mathbb{R}))$-measurable
for every $n \in \mathbb{N}$. See Section 5.3 for a verification. ◇

Remark 2.5 Note that the considered estimators for the individual premium have the
following representations:

$$ \frac{1}{n}\mathcal{R}_\rho(\mathcal{N}_{n\hat m_{u_n},\, n\hat s^2_{u_n}}) = \hat m_{u_n} + \frac{1}{\sqrt{n}}\,\hat s_{u_n}\,\mathcal{R}_\rho(\mathcal{N}_{0,1}), \tag{14} $$

$$ \frac{1}{n}\mathcal{R}_\rho(\hat\mu_{u_n}^{*n}) = \hat m_{u_n} + \frac{1}{\sqrt{n}}\,\hat s_{u_n}\,\mathcal{R}_\rho(\mathcal{N}_{0,1}) + O_{\mathbb{P}\text{-a.s.}}(n^{-1/2-\beta\gamma}). \tag{15} $$

Equation (14) is a simple consequence of part (c) of Assumption 2.1, and (15) follows
from (14) and part (i) of Theorem 2.3. ◇


3. Illustration of Assumption 2.1

3.1. Value at Risk

The Value at Risk at level $\alpha \in (0,1)$ is the map $\mathrm{VaR}_\alpha : L^0 \to \mathbb{R}$ defined by

$$ \mathrm{VaR}_\alpha(X) := F_X^{\leftarrow}(\alpha) := \inf\{x \in \mathbb{R} : F_X(x) \ge \alpha\}. $$

It is clearly law-invariant and easily seen to be monotone, cash additive, and positively
homogeneous. Moreover, $\mathcal{M}_1^\lambda \subset \mathcal{M}(L^0)$ trivially holds for every
$\lambda > 0$. In particular, it satisfies condition (c) of Assumption 2.1. It follows from
Theorem 2 in [20] that $\mathrm{VaR}_\alpha$ also satisfies condition (d) of Assumption 2.1 for
$\beta = 1$ and every $\lambda > 0$.
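
For an empirical distribution the infimum in the definition of the Value at Risk is attained at an order statistic. The following sketch (function name and sample values are hypothetical) computes this lower quantile and can be used to check cash additivity and positive homogeneity on a small sample.

```python
import math


def empirical_var(sample, alpha):
    """Lower alpha-quantile inf{x : F(x) >= alpha} of the empirical distribution."""
    xs = sorted(sample)
    n = len(xs)
    # Smallest k with k / n >= alpha; the small tolerance guards against
    # floating-point noise in alpha * n (e.g. 0.7 * 10 > 7 in binary floats).
    k = max(1, math.ceil(alpha * n - 1e-12))
    return xs[k - 1]


data = [3, 1, 2, 4]
print(empirical_var(data, 0.5))                      # F(2) = 0.5 >= 0.5
print(empirical_var([x + 10 for x in data], 0.5))    # cash additivity
print(empirical_var([3 * x for x in data], 0.5))     # positive homogeneity
```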

3.2. Coherent distortion risk measure 

Let $g : [0,1] \to [0,1]$ be a convex distortion function, i.e. a convex and nondecreasing
function with $g(0) = 0$ and $g(1) = 1$. Note that the function $g$ is continuous on $[0,1)$
and might jump at 1. The distortion risk measure associated with $g$ is defined by

$$ \rho_g(X) := -\int_{-\infty}^{0} g(F_X(x))\,dx + \int_{0}^{\infty} \big(1 - g(F_X(x))\big)\,dx \tag{16} $$

for every real-valued random variable $X$ (on some given atomless probability space)
satisfying $\int_0^\infty (1 - g(F_{|X|}(x)))\,dx < \infty$, where $F_X$ and $F_{|X|}$ denote the
distribution functions of $X$ and $|X|$, respectively. The set $\mathcal{X}_g \subset L^0$ of all
such random variables forms a linear subspace of $L^1$; this follows from [HI Proposition 9.5]
and (T] Proposition 4.75]. It is known that $\rho_g$ is a law-invariant coherent risk measure;
see, for instance, [19].

Lemma 3.1 Let $\rho_g : \mathcal{X}_g \to \mathbb{R}$ be the distortion risk measure associated
with a convex distortion function $g$. Assume that there exist constants $L, \beta > 0$ such that

$$ 1 - g(t) \le L(1 - t)^\beta \quad \text{for all } t \in [0,1]. \tag{17} $$

Then $\rho_g$ satisfies conditions (c)-(d) of Assumption 2.1 for this $\beta$ and every
$\lambda > 0$ with $\lambda\beta > 1$.

Proof The first part of condition (c) is satisfied since $\rho_g$ is a law-invariant coherent risk
measure. Condition (17) and the convexity of the distortion function $g$ together imply
$|g(t) - g(t')| \le L|t - t'|^\beta$ for all $t, t' \in [0,1]$. In view of (16) and the assumption
$\lambda\beta > 1$, it follows easily that condition (d) and the second part of condition (c) hold too. □
If specifically $g(t) = \max\{t - \alpha;\, 0\}/(1 - \alpha)$ for any fixed $\alpha \in (0,1)$, then
we have $\mathcal{X}_g = L^1$ and $\rho_g$ is nothing but the Average Value at Risk
$\mathrm{AVaR}_\alpha$ at level $\alpha$. In this case, condition (17) holds for $\beta = 1$
(with $L = 1/(1-\alpha)$). That is, $\mathrm{AVaR}_\alpha$ satisfies conditions (c)-(d) of
Assumption 2.1 for every $\lambda > 1$.
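
For an empirical distribution with order statistics $x_{(1)} \le \dots \le x_{(u)}$, the integral in (16) reduces to the weighted sum $\sum_i x_{(i)}\,(g(i/u) - g((i-1)/u))$. The sketch below evaluates this sum; the function names are hypothetical, and the AVaR distortion uses the normalisation $\max\{t-\alpha;0\}/(1-\alpha)$ so that $g(1) = 1$.

```python
def distortion_risk(sample, g):
    """Distortion risk measure of the empirical distribution of `sample`.

    Evaluates sum_i x_(i) * (g(i/u) - g((i-1)/u)), the discrete form of (16).
    """
    xs = sorted(sample)
    u = len(xs)
    return sum(x * (g((i + 1) / u) - g(i / u)) for i, x in enumerate(xs))


def avar_distortion(alpha):
    """Convex distortion function of the Average Value at Risk at level alpha."""
    return lambda t: max(t - alpha, 0.0) / (1.0 - alpha)


sample = [4.0, 1.0, 3.0, 2.0]
print(distortion_risk(sample, avar_distortion(0.5)))  # mean of the two largest values
print(distortion_risk(sample, lambda t: t))           # g = identity gives the mean
```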


3.3. Further coherent risk measures 

Not every law-invariant coherent risk measure $\rho$ can be seen as a distortion risk measure.
In particular, Lemma 3.1 is not a general device to verify condition (d) of Assumption
2.1. If $\rho$ is not a distortion risk measure, then the following Lemma 3.2 might help.


Lemma 3.2 Let $p \ge 1$. Let $\rho : L^p \to \mathbb{R}$ be a law-invariant coherent risk measure and
define a function $g_\rho : [0,1] \to [0,1]$ by $g_\rho(t) := 1 - \rho(B_{1-t})$, where $B_{1-t}$
refers to any Bernoulli random variable with expectation $1 - t$. Assume that there exist constants
$L, \beta > 0$ such that

$$ 1 - g_\rho(t) \le L(1 - t)^\beta \quad \text{for all } t \in [0,1]. \tag{18} $$

Then $\rho$ satisfies conditions (c)-(d) of Assumption 2.1 for this $\beta$ and every
$\lambda > p$ with $\lambda\beta > 1$.

Proof The assumption $\lambda > p$ ensures $\mathcal{M}_1^\lambda \subset \mathcal{M}(L^p)$, so that
condition (c) is satisfied. Since $\rho$ is defined on $L^p$, we can find a set $\mathcal{G}_\rho$
of continuous convex distortion functions such that $g_\rho = \inf_{g \in \mathcal{G}_\rho} g$ and

$$ \rho(X) = \sup_{g \in \mathcal{G}_\rho} \rho_g(X) \quad \text{for all } X \in L^p. \tag{19} $$

This follows from Proposition 5.1 and Remark 3.2 in [2] (adapted to our definition of
monotonicity and cash additivity); see also [inilI2]. Below we will show that (18) implies

$$ |g(t) - g(t')| \le L|t - t'|^\beta \quad \text{for all } t, t' \in [0,1] \text{ and } g \in \mathcal{G}_\rho. \tag{20} $$


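With the help of (19) and (20) we then obtain

$$
\begin{aligned}
|\mathcal{R}_\rho(m_n) - \mathcal{R}_\rho(\mathcal{N}_{0,1})|
 &= \Big|\sup_{g \in \mathcal{G}_\rho} \mathcal{R}_{\rho_g}(m_n) - \sup_{g \in \mathcal{G}_\rho} \mathcal{R}_{\rho_g}(\mathcal{N}_{0,1})\Big| \\
 &\le \sup_{g \in \mathcal{G}_\rho} \big|\mathcal{R}_{\rho_g}(m_n) - \mathcal{R}_{\rho_g}(\mathcal{N}_{0,1})\big| \\
 &\le \sup_{g \in \mathcal{G}_\rho} \int_{-\infty}^{\infty} \big|g(F_{m_n}(x)) - g(F_{\mathcal{N}_{0,1}}(x))\big|\,dx \\
 &\le L \int_{-\infty}^{\infty} \big|F_{m_n}(x) - F_{\mathcal{N}_{0,1}}(x)\big|^\beta\,dx \\
 &\le C\, d_{\phi_\lambda}(m_n, \mathcal{N}_{0,1})^\beta
\end{aligned}
$$

for the constant $C := L \int_{-\infty}^{\infty} 1/\phi_\lambda(x)^\beta\,dx$ (which is finite due to
the assumption $\lambda\beta > 1$). That is, condition (d) is satisfied too.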

It remains to show (20), for which we will adapt the arguments of Section 4.3 in
m-. Let $0 \le t < t' \le 1$. Since the underlying probability space was assumed to be
atomless, we may pick a measurable decomposition $A_1 \cup A_2 \cup A_3$ of the probability
domain such that $\mathbb{P}[A_1] = 1 - t'$, $\mathbb{P}[A_2] = t' - t$ and $\mathbb{P}[A_3] = t$,
where $\mathbb{P}$ refers to the corresponding probability measure. Define random variables
$B_{1-t'} := \mathbb{1}_{A_1}$, $B_{1-t} := \mathbb{1}_{A_1 \cup A_2}$ and
$B_{t'-t} := \mathbb{1}_{A_2}$, and note that they are distributed according to the Bernoulli
distribution with parameters $1 - t'$, $1 - t$ and $t' - t$, respectively. Moreover we clearly
have $B_{1-t} = B_{1-t'} + B_{t'-t}$. By the subadditivity of $\rho_g$ we can conclude
$\rho_g(B_{1-t}) \le \rho_g(B_{1-t'}) + \rho_g(B_{t'-t})$, and so

$$
\begin{aligned}
g(t') - g(t) &= 1 - \rho_g(B_{1-t'}) - \big(1 - \rho_g(B_{1-t})\big) \\
 &\le \rho_g(B_{t'-t}) \\
 &\le \rho(B_{t'-t}) \\
 &\le \sup_{u \in (0,1]} \frac{\rho(B_u)}{u^\beta}\,(t' - t)^\beta \\
 &\le \sup_{u \in (0,1]} \frac{1 - g_\rho(1 - u)}{u^\beta}\,(t' - t)^\beta \\
 &\le \sup_{v \in [0,1)} \frac{1 - g_\rho(v)}{(1 - v)^\beta}\,(t' - t)^\beta
\end{aligned}
$$

for every $g \in \mathcal{G}_\rho$, where the second “$\le$” is ensured by (19). By (18) the constant
$\sup_{v \in [0,1)} (1 - g_\rho(v))/(1 - v)^\beta$ is finite (it is bounded by $L$). Thus, since
every $g \in \mathcal{G}_\rho$ is also continuous at 1, condition (18) indeed implies (20). □

It is worth mentioning that if $\rho$ is a distortion risk measure with distortion function
$g$, then $g_\rho = g$ and condition (18) boils down to condition (17). Here are two examples
for law-invariant coherent risk measures on Orlicz hearts that are not distortion risk
measures:

Example 3.3 Given $a \in (0,1]$ and $p \in [1,\infty)$, the one-sided $p$th moment risk measure
is the map $\rho : L^p \to \mathbb{R}$ defined by

$$ \rho(X) := \mathbb{E}[X] + a\,\mathbb{E}\big[((X - \mathbb{E}[X])^+)^p\big]^{1/p}, $$

where $\mathbb{E}$ refers to the expectation w.r.t. the probability measure of the basic probability
space. The map $\rho$ is clearly law-invariant and can easily be shown to be a coherent risk
measure. But by Lemma A.5 in [10] it is not a distortion risk measure.

The function $g_\rho$ defined in Lemma 3.2 is given by $g_\rho(t) = t - a\,t(1 - t)^{1/p}$, and thus

$$ 1 - g_\rho(t) = 1 - t + a\,t(1 - t)^{1/p} \le (1 + a)(1 - t)^{1/p} \quad \text{for all } t \in [0,1]. $$

Therefore condition (18) is satisfied for $\beta := 1/p$. ◇


Example 3.4 In pQ it has been pointed out that expectiles may be viewed as law-invariant
coherent risk measures. The expectiles-based risk measure associated with
$\alpha \in [1/2, 1)$ is the map $\rho : L^2 \to \mathbb{R}$ defined by

$$ \rho(X) := \operatorname{argmin}_{m \in \mathbb{R}} \big\{\alpha\,\|(X - m)^+\|_2^2 + (1 - \alpha)\,\|(m - X)^+\|_2^2\big\}. $$

It follows from Theorem 8 in [5] that $\rho$ is not a distortion risk measure unless $\alpha = 1/2$.

The function $g_\rho$ defined in Lemma 3.2 is given by
$g_\rho(t) = (1 - \alpha)t/(1 - \alpha + (1 - t)(2\alpha - 1))$, and thus

$$ 1 - g_\rho(t) = \frac{\alpha(1 - t)}{1 - \alpha + (1 - t)(2\alpha - 1)} \le \frac{\alpha}{1 - \alpha}\,(1 - t) \quad \text{for all } t \in [0,1]. $$

Therefore condition (18) is satisfied for $\beta := 1$. ◇
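
The bounds claimed in Examples 3.3 and 3.4 can be checked numerically on a grid. The sketch below does this for hypothetical parameter values ($a = 0.5$, $p = 2$ and $\alpha = 0.75$); a grid check is of course no proof, it merely guards against algebra slips.

```python
def g_moment(t, a, p):
    """g_rho for the one-sided p-th moment risk measure (Example 3.3)."""
    return t - a * t * (1.0 - t) ** (1.0 / p)


def g_expectile(t, alpha):
    """g_rho for the expectiles-based risk measure (Example 3.4)."""
    return (1.0 - alpha) * t / (1.0 - alpha + (1.0 - t) * (2.0 * alpha - 1.0))


def bound_holds(g, L, beta, steps=1000):
    """Check 1 - g(t) <= L * (1 - t)**beta on an equidistant grid of [0, 1]."""
    grid = (k / steps for k in range(steps + 1))
    return all(1.0 - g(t) <= L * (1.0 - t) ** beta + 1e-12 for t in grid)


a, p, alpha = 0.5, 2.0, 0.75
print(bound_holds(lambda t: g_moment(t, a, p), L=1.0 + a, beta=1.0 / p))
print(bound_holds(lambda t: g_expectile(t, alpha), L=alpha / (1.0 - alpha), beta=1.0))
```

For the moment-based example the exponent $\beta = 1/p$ cannot simply be replaced by 1: with $L = 1 + a$ the grid check fails for $\beta = 1$ when $p = 2$.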


4. Numerical examples 

In this section we present some numerical examples to illustrate the results of Section 2.
Our results show that both the estimated normal approximation and the empirical plug-in
estimator lead to reasonable estimators for the premium of an individual risk within
a homogeneous insurance collective. Our results also show that these two estimators
are asymptotically equivalent. Nevertheless for small to moderate collective sizes $n$ the
quality of the estimators can vary from case to case. For example, in the case where $\rho$
is the Value at Risk at level $\alpha$, Theorems 2.2 and 2.3 show that for both
estimators the estimation error converges almost surely to zero at rate (nearly) 1/2 when
$\mathbb{E}[|Y_1|^\lambda] < \infty$ for some $\lambda > 2$ (where $Y_1$ refers to any
$\mu$-distributed random variable). On the other hand, the latter condition does not exclude that
$\mathbb{E}[|Y_1|^{\lambda+\varepsilon}] = \infty$ for some
small $\varepsilon > 0$. In this case the total claim distribution can be essentially skewed to the
right when the number of individual risks $n$ is small to moderate; cf. Figure 1. So one
would expect that especially for heavy-tailed $\mu$ and small to moderate $n$ the estimators
perform only moderately well. One would also expect that for heavy-tailed $\mu$ (and even
for medium-tailed $\mu$) and small to moderate $n$ the empirical plug-in estimator should
outperform the estimated normal approximation. Our goal in this section is to provide
empirical evidence for our conjectures.

To this end let us consider a sequence $(Y_i)$ of i.i.d. nonnegative random variables on
a common probability space with distribution

$$ \mu = (1 - p)\,\delta_0 + p\,P_{a,b} $$

for some $p \in (0,1)$, where $P_{a,b}$ is the Pareto distribution with parameters $a > 2$ and
$b > 0$. The Pareto distribution $P_{a,b}$ is determined by the Lebesgue density

$$ f_{a,b}(x) := a\,b^{-1}(b^{-1}x + 1)^{-(a+1)}\,\mathbb{1}_{(0,\infty)}(x), $$


and the assumption $a > 2$ ensures that $\mathbb{E}[|Y_1|^\lambda] < \infty$ for all
$\lambda \in (2, a)$. We regard $Y_1, \ldots, Y_n$ as a homogeneous insurance collective of size
$n$, the number $p$ as the probability for the event of a strictly positive individual claim
amount, and $P_{a,b}$ as the individual claim distribution conditioned on this event. Note that
in our example the mean $m$ and the variance $s^2$ of $\mu$ are given by

$$ m = \frac{pb}{a - 1} \quad \text{and} \quad s^2 = \frac{2b^2 p}{(a - 1)(a - 2)} - \frac{b^2 p^2}{(a - 1)^2}. \tag{21} $$
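
The claim model and the moment formulas can be cross-checked by simulation. The sketch below draws from $P_{a,b}$ by inverting its distribution function $1 - (1 + x/b)^{-a}$; the function names, the seed and the sample size are assumptions made for this illustration.

```python
import random


def claim_moments(p, a, b):
    """Mean and variance of (1 - p) * delta_0 + p * P_{a,b} for a > 2."""
    m = p * b / (a - 1.0)
    s2 = 2.0 * b * b * p / ((a - 1.0) * (a - 2.0)) - m * m
    return m, s2


def sample_claim(rng, p, a, b):
    """One single claim: zero with probability 1 - p, else Pareto(a, b)."""
    if rng.random() >= p:
        return 0.0
    u = rng.random()  # uniform on [0, 1), so 1 - u lies in (0, 1]
    return b * ((1.0 - u) ** (-1.0 / a) - 1.0)  # inverse distribution function


rng = random.Random(12345)
claims = [sample_claim(rng, 0.1, 10.0, 90.0) for _ in range(50_000)]
print(claim_moments(0.1, 10.0, 90.0), sum(claims) / len(claims))
```

With $p = 0.1$ the parameter pairs $(a, b) = (2.1, 11), (3, 20), (6, 50), (10, 90)$ all give $m = 1$, while the variance grows quickly as $a$ approaches 2.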



In the first part of this section, we estimate the total claim distribution $\mu^{*n}$, i.e. the
distribution of $\sum_{i=1}^n Y_i$, by means of the empirical distribution based on a Monte-Carlo
simulation. The plots in Figure 1 were derived from a simulation with 100,000 Monte-Carlo
paths. We set $p = 0.1$ and chose the parameters $a$ and $b$ in such a way that the
expected value of a single claim was normalised to 1. Each row of plots uses the same set of
parameters and each column the same collective size, with $n = 100$ on the
left, $n = 150$ in the middle and $n = 200$ on the right. The first row shows the results for
$a = 2.1$ and $b = 11$, the second row shows $a = 3$ and $b = 20$, the third row shows $a = 6$
and $b = 50$ and the fourth row shows $a = 10$ and $b = 90$. In each plot the continuous
line represents the estimator for $\mu^{*n}$ and the dashed line the probability density of the
normal distribution $\mathcal{N}_{nm,\, ns^2}$ with $m$ and $s^2$ determined through (21). We
emphasize that $\mu^{*n}$ has in fact point mass in zero. But the point mass is equal to
$(1 - p)^n$ and therefore extremely small. This is why the point mass of the empirical estimator
is not visible in the plots.

One can see that the empirical total claim distributions in the first row of Figure 1 are
strongly skewed to the right even for larger collective sizes. The density of the normal
distribution is very flat and has much mass on the negative semiaxis. The reason for
this shape is the high variance $s^2$, which increases rapidly as $a$ gets closer to 2. In the
case of $a = 2.1$ and $b = 11$ the rate of convergence in the Berry-Esseen theorem is close to
zero, which means that large collective sizes are needed to provide a suitable estimator.

In the second row of Figure 1, for $a = 3$ and $b = 20$, the empirical total claim
distributions are still strongly skewed to the right. One can see that the normal approximation
still does not resemble the empirical distribution. The deviation decreases visibly with
increasing collective size due to the higher rate of convergence in the Berry-Esseen
theorem. Compared to the first row with $a = 2.1$ and $b = 11$, the quality of the normal
approximation is better in the second row with $a = 3$ and $b = 20$, which can
be explained by the increasing rate of convergence in the Berry-Esseen theorem. For
$\lambda \in (2, 3]$ the convergence rate to the normal distribution is strictly increasing in
$\lambda$. For $\lambda > 3$ the convergence rate cannot be improved any more.
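
The dependence of the Berry-Esseen rate on the available moments can be tabulated directly from the exponent $\gamma = \min\{\lambda - 2;\, 1\}/2$ used in Section 2. In the sketch below the function name is hypothetical, and the choice of $\lambda$ slightly below the Pareto parameter $a$ is an assumption reflecting that only moments of order $\lambda < a$ are finite.

```python
def berry_esseen_exponent(lam):
    """Exponent gamma = min(lam - 2, 1) / 2 for a finite lam-th moment, lam > 2."""
    if lam <= 2:
        raise ValueError("a finite moment of order lam > 2 is required")
    return min(lam - 2.0, 1.0) / 2.0


# Pareto parameter a -> a moment order just below a (illustrative choice).
for a in (2.1, 3.0, 6.0, 10.0):
    lam = a - 1e-6
    print(a, round(berry_esseen_exponent(lam), 4))
```

The exponent is close to 0 for $a = 2.1$, close to 1/2 for $a = 3$, and equal to 1/2 for $a = 6$ and $a = 10$.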
Figure 1: The continuous line shows the $n$-fold convolution $\mu^{*n}$ of
$\mu = (1 - p)\,\delta_0 + p\,P_{a,b}$ for $p = 0.1$ and the Pareto distribution $P_{a,b}$ with
parameter $a = 2.1$ in the first row, $a = 3$ in the second row, $a = 6$ in the third row and
$a = 10$ in the fourth row, and collective sizes $n = 100$ in the first column, $n = 150$ in the
second column and $n = 200$ in the third column. The dashed line shows the density of the
respective normal distribution in each case.
In the third and fourth row of Figure 1, for $a = 6$ and $b = 50$ and for $a = 10$ and $b = 90$,
the normal approximation provides a good approximation even for small collective sizes.
The empirical total claim distributions are in both cases almost symmetric and the
approximation leads to a good fit of both curves. The third moment of $X_1$ exists in
both cases and due to the Berry-Esseen theorem the deviation of $\mu^{*n}$ from the normal
distribution converges to zero with rate 1/2. We can see that there is no remarkable
improvement in the convergence rate once the existence of the third moment is guaranteed.

In the second part of this section we compare the estimated normal approximation
with the empirical plug-in estimator where the role of the risk measure $\rho$ is played by
the Value at Risk at level $\alpha = 0.99$. To save computing time we discretized the Pareto
distribution $P_{a,b}$ on the equidistant grid $10\,\mathbb{N}_0 = \{0, 10, 20, \ldots\}$. The plots
in Figure 2 were derived by a Monte-Carlo method using 100 Monte-Carlo paths in each simulation.
Once again we chose $p = 0.1$. In order to compare the estimators we first calculated the
exact Value at Risk at level 0.99 of $\mu^{*n}$ (in fact we estimated it by means of a
Monte-Carlo simulation based on 100,000 runs) in dependence on the collective size $n$. In each
plot in Figure 2 the dash-dotted line represents the relative Value at Risk
$\mathcal{R}_\rho(\mu^{*n})/n$, which we take as a reference to illustrate the biases of the
estimators. The dashed line shows the estimated normal approximation
$\mathcal{R}_\rho(\mathcal{N}_{n\hat m_n,\, n\hat s^2_n})/n$ for the Value at Risk
relative to $n$. The continuous line shows the empirical plug-in estimator
$\mathcal{R}_\rho(\hat\mu_n^{*n})/n$ for the Value at Risk relative to $n$.
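
Once the claims live on an equidistant grid, the empirical plug-in estimator for the Value at Risk can be computed exactly by convolving the empirical probability mass function with itself. The following is a minimal pure-Python sketch of this idea, not the computational method of the Appendix; the function names, the rounding of claims to the nearest grid point and the toy data are assumptions.

```python
from collections import Counter


def empirical_pmf(claims, step):
    """Empirical probability mass function on the grid {0, step, 2*step, ...}."""
    counts = Counter(round(y / step) for y in claims)  # nearest grid index
    return [counts.get(k, 0) / len(claims) for k in range(max(counts) + 1)]


def convolve(p, q):
    """Convolution of two probability mass functions given as lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        if pi:
            for j, qj in enumerate(q):
                out[i + j] += pi * qj
    return out


def n_fold(pmf, n):
    """n-fold convolution by repeated squaring."""
    result, base = [1.0], pmf
    while n:
        if n & 1:
            result = convolve(result, base)
        n >>= 1
        if n:
            base = convolve(base, base)
    return result


def plug_in_var(claims, n, alpha, step):
    """VaR_alpha of the n-fold convolution of the empirical claim distribution."""
    total, acc = n_fold(empirical_pmf(claims, step), n), 0.0
    for k, weight in enumerate(total):
        acc += weight
        if acc >= alpha - 1e-12:
            return k * step
    return (len(total) - 1) * step


claims = [0, 0, 0, 0, 0, 0, 0, 10, 0, 30]  # hypothetical historical claims
print(plug_in_var(claims, n=20, alpha=0.99, step=10) / 20)
```

For large collectives a fast Fourier transform would be the natural replacement for the quadratic-time `convolve`; the plain loop is used here only to keep the sketch dependency-free.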

The first row shows the relative Value at Risk for the parameters $a = 2.1$ and $b = 11$
on the left and $a = 3$ and $b = 20$ on the right hand side. In the second row we have $a = 6$
and $b = 50$ on the left and $a = 10$ and $b = 90$ on the right hand side. Once again the
parameters were chosen such that the expected value of a single claim was normalised
to 1.

For $a = 2.1$ we can see that both estimators show a large negative bias. The slow
convergence in the Berry-Esseen theorem transfers directly to the convergence of the
relative Value at Risk of the distributions (recall that the Value at Risk fulfills condition
(d) of Assumption 2.1 for $\beta = 1$). Due to this slow convergence the collective size has
to be chosen very large to provide a good estimation. What is most striking is the large
bias of the relative empirical plug-in estimator $\mathcal{R}_\rho(\hat\mu_n^{*n})/n$. The
heaviness of the tails causes the empirical distribution $\hat\mu_n^{*n}$ to converge very slowly
to $\mu^{*n}$. We can see that in the case $a = 3$ the bias of both estimators decreases visibly.
However, in both cases the empirical plug-in estimator yields a better estimation.

The plots for $a = 6$ and $a = 10$ resemble each other very much. In both cases the
existence of the third moment of $X_1$ is guaranteed, yielding the same rate of convergence
in the Berry-Esseen theorem. We can see that for small $n$, e.g. $n < 40$, both estimators
show a large bias. However for $n < 100$ the empirical plug-in estimator provides a better
estimation. For $n > 100$ the estimated normal approximation could be preferred over
the empirical plug-in estimator, because the biases of both estimators are more or less
the same and the estimated normal approximation consumes less computing time.

Figure 2: $\mathcal{R}_\rho(\mu^{*n})/n$ (dash-dotted line) as well as the average of 100
Monte-Carlo paths of respectively $\mathcal{R}_\rho(\mathcal{N}_{n\hat m_n,\, n\hat s^2_n})/n$
(dashed line) and $\mathcal{R}_\rho(\hat\mu_n^{*n})/n$ (continuous line)
for $\rho = \mathrm{VaR}_{0.99}$ in dependence on the collective size $n$, showing $a = 2.1$ on the
left hand side and $a = 3$ on the right hand side of the first row and $a = 6$ on
the left hand side and $a = 10$ on the right hand side of the second row.

In conclusion, the estimated normal approximation is not suitable
for heavy-tailed (to medium-tailed) distributions whenever the collective size is small.
In this case it is sensible to apply the empirical plug-in estimator, although it consumes
more computing time than the estimated normal approximation.

5. Proofs 

The proofs of Theorems 2.2 and 2.3 rely on the following nonuniform Berry-Esseen
inequality (22). The inequality involves the nonuniform Kolmogorov distance
$d_{\phi_\lambda}$ which was introduced in (13).
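Theorem 5.1 Let $(X_i)$ be a sequence of i.i.d. random variables on some probability
space $(\Omega, \mathcal{F}, \mathbb{P})$ such that $\mathrm{Var}[X_1] > 0$ and
$\mathbb{E}[|X_1|^\lambda] < \infty$ for some $\lambda > 2$. For every $n \in \mathbb{N}$, let

$$ Z_n := \frac{\sum_{i=1}^{n} (X_i - \mathbb{E}[X_1])}{\sqrt{n\,\mathrm{Var}[X_1]}}. $$

Then there exists a universal constant $C_\lambda \in (0,\infty)$ such that

$$ d_{\phi_\lambda}(\mathbb{P}_{Z_n}, \mathcal{N}_{0,1}) \le C_\lambda\, f(\mathbb{P}_{X_1})\, n^{-\gamma} \quad \text{for all } n \in \mathbb{N} \tag{22} $$

with $\gamma := \min\{1, \lambda - 2\}/2$, where

$$
f(\mathbb{P}_{X_1}) :=
\begin{cases}
\dfrac{\mathbb{E}[|X_1 - \mathbb{E}[X_1]|^\lambda]}{\mathrm{Var}[X_1]^{\lambda/2}}, & 2 < \lambda \le 3, \\[2ex]
\max\Big\{\dfrac{\mathbb{E}[|X_1 - \mathbb{E}[X_1]|^3]}{\mathrm{Var}[X_1]^{3/2}},\; \dfrac{\mathbb{E}[|X_1 - \mathbb{E}[X_1]|^\lambda]}{\mathrm{Var}[X_1]^{\lambda/2}}\Big\}, & \lambda > 3.
\end{cases}
\tag{23}
$$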


By “universal constant” we mean that the constant is independent of $\mathbb{P}_{X_1}$. Inequality (22) has been proven by Nagaev [?] and Bikelis [1] for $\lambda = 3$ and $\lambda \in (2, 3]$, respectively. Meanwhile there exist several estimates for the constant $C_\lambda$ for $\lambda \in (2, 3]$; see [15] and references cited therein. For $\lambda > 3$ the inequality is a direct consequence of Theorem 5.15 in [?].
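As a quick illustration of how the exponent $\gamma = \min\{1, \lambda - 2\}/2$ in the nonuniform Berry-Esseen bound depends on the moment order $\lambda$ (the particular values of $\lambda$ are chosen only for illustration):

\[ \lambda = 2.5:\ \gamma = \frac{\min\{1,\, 0.5\}}{2} = \frac{1}{4}, \qquad \lambda = 3:\ \gamma = \frac{\min\{1,\, 1\}}{2} = \frac{1}{2}, \qquad \lambda = 6:\ \gamma = \frac{\min\{1,\, 4\}}{2} = \frac{1}{2}. \]

So the rate in $n$ improves with $\lambda$ only up to $\lambda = 3$; beyond that it stays at $n^{-1/2}$.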


5.1. Proof of Theorem 2.2




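(i): By part (c) of Assumption 2.1 and the representation of the estimator given earlier (and its analogue in the case of known parameters), we have

\[ \mathcal{R}_\rho(\mathcal{N}_{n\hat m_{u_n}, n\hat s_{u_n}^2}) - \mathcal{R}_\rho(\mathcal{N}_{nm, ns^2}) = \sqrt{n}\,(\hat s_{u_n} - s)\,\mathcal{R}_\rho(\mathcal{N}_{0,1}) + n\,(\hat m_{u_n} - m). \tag{24} \]

Since the empirical standard deviation $\hat s_{u_n}$ converges $\mathbb{P}$-a.s. to the true standard deviation $s$, the claim of part (i) follows by dividing Equation (24) by $n$.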

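(ii): Let $S_n$ be a random variable with distribution $\mu^{*n}$, set $Z_n := (S_n - nm)/(\sqrt{n}\,s)$, and note that $\mathrm{law}\{\sqrt{n}\,s Z_n + nm\} = \mu^{*n}$. Write $X_n$ for any random variable distributed according to the normal distribution $\mathcal{N}_{nm, ns^2}$, and note that $Z := (X_n - nm)/(\sqrt{n}\,s)$ is $\mathcal{N}_{0,1}$-distributed. Due to part (c) of Assumption 2.1 we obtain

\[ \begin{aligned} \mathcal{R}_\rho(\mathcal{N}_{nm, ns^2}) - \mathcal{R}_\rho(\mu^{*n}) &= \rho(\sqrt{n}\,s Z + nm) - \rho(\sqrt{n}\,s Z_n + nm) \\ &= \sqrt{n}\,s\,(\rho(Z) - \rho(Z_n)) \\ &= \sqrt{n}\,s\,(\mathcal{R}_\rho(\mathcal{N}_{0,1}) - \mathcal{R}_\rho(m_n)), \end{aligned} \tag{25} \]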


where $m_n$ denotes the law of $Z_n$. The nonuniform Berry-Esseen inequality of Theorem 5.1 shows that there exists a constant $K_\lambda \in (0, \infty)$ such that $d_{(\lambda)}(\mathcal{N}_{0,1}, m_n) \le K_\lambda n^{-\gamma}$ for all $n \in \mathbb{N}$. Along with (25) and part (d) of Assumption 2.1 this ensures that we can find constants $C, \beta \in (0, \infty)$ such that $n^{-1/2}\,|\mathcal{R}_\rho(\mathcal{N}_{nm, ns^2}) - \mathcal{R}_\rho(\mu^{*n})| \le C\, d_{(\lambda)}(\mathcal{N}_{0,1}, m_n)^\beta \le C K_\lambda^\beta n^{-\gamma\beta}$ for all $n \in \mathbb{N}$. This completes the proof of part (ii).

(iii): The assertion follows from (i)-(ii).

(iv): By the Marcinkiewicz-Zygmund strong law of large numbers, we have that $n^r(\hat m_{u_n} - m)$ converges $\mathbb{P}$-a.s. to zero for every $r < 1/2$. So the assertion follows from part (iii).

(v): The classical Central Limit Theorem says that the law of $\sqrt{u_n}\,(\hat m_{u_n} - m)$ converges weakly to $\mathcal{N}_{0,s^2}$. So the assertion follows from Slutsky's lemma and part (iii). □


5.2. Proof of Theorem 2.3

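(i): Analogously to (25), we obtain

\[ \mathcal{R}_\rho(\mathcal{N}_{n\hat m_{u_n}(\omega), n\hat s_{u_n}^2(\omega)}) - \mathcal{R}_\rho(\hat\mu_{u_n}^{*n}(\omega; \cdot)) = \sqrt{n}\,\hat s_{u_n}(\omega)\,\big(\mathcal{R}_\rho(\mathcal{N}_{0,1}) - \mathcal{R}_\rho(m_n(\omega; \cdot))\big) \tag{26} \]

for all $\omega \in \Omega$, where $m_n(\omega; \cdot)$ denotes the law of the random variable $Z_n^\omega(\cdot) := (S_n^\omega(\cdot) - n\hat m_{u_n}(\omega))/(\sqrt{n}\,\hat s_{u_n}(\omega))$ for any random variable $S_n^\omega(\cdot)$ with distribution $\hat\mu_{u_n}^{*n}(\omega; \cdot)$ and defined on some probability space $(\Omega^\omega, \mathcal{F}^\omega, \mathbb{P}^\omega)$. Note that $\hat\mu_{u_n}(\omega; \cdot)$ has mean $\hat m_{u_n}(\omega)$ and standard deviation $\hat s_{u_n}(\omega)$ for every fixed $\omega$.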

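First let $\lambda > 3$. By the nonuniform Berry-Esseen inequality of Theorem 5.1 we have

\[ d_{(\lambda)}(\mathcal{N}_{0,1}, m_n(\omega; \cdot)) \le C_\lambda \max\left\{ \frac{\int |x - \int y\,\hat\mu_{u_n}(\omega; dy)|^3\,\hat\mu_{u_n}(\omega; dx)}{\{\int (x - \int y\,\hat\mu_{u_n}(\omega; dy))^2\,\hat\mu_{u_n}(\omega; dx)\}^{3/2}},\ \frac{\int |x - \int y\,\hat\mu_{u_n}(\omega; dy)|^\lambda\,\hat\mu_{u_n}(\omega; dx)}{\{\int (x - \int y\,\hat\mu_{u_n}(\omega; dy))^2\,\hat\mu_{u_n}(\omega; dx)\}^{\lambda/2}} \right\} n^{-\gamma} \tag{27} \]

for all $n \in \mathbb{N}$, where $C_\lambda \in (0, \infty)$ is a universal constant depending only on $\lambda$ and being independent of $n$ and $\omega$. As a consequence of part (a) of Assumption 2.2 we have that $\int |x|^\lambda\,\hat\mu_{u_n}(\omega; dx) = \frac{1}{u_n}\sum_{i=1}^{u_n} |Y_i(\omega)|^\lambda$ converges to $\mathbb{E}[|Y_1|^\lambda]$ for $\mathbb{P}$-a.e. $\omega$. That is, the numerator of

\[ \frac{\int |x - \int y\,\hat\mu_{u_n}(\omega; dy)|^\lambda\,\hat\mu_{u_n}(\omega; dx)}{\{\int (x - \int y\,\hat\mu_{u_n}(\omega; dy))^2\,\hat\mu_{u_n}(\omega; dx)\}^{\lambda/2}} \tag{28} \]

is bounded above by an expression that converges to $2^\lambda\,\mathbb{E}[|Y_1|^\lambda]$ for $\mathbb{P}$-a.e. $\omega$. The denominator is nothing but $\hat s_{u_n}(\omega)^\lambda$ and thus converges to $s^\lambda$ for $\mathbb{P}$-a.e. $\omega$. Hence the expression in (28) converges to a positive constant for $\mathbb{P}$-a.e. $\omega$. In the same way we obtain that

\[ \frac{\int |x - \int y\,\hat\mu_{u_n}(\omega; dy)|^3\,\hat\mu_{u_n}(\omega; dx)}{\{\int (x - \int y\,\hat\mu_{u_n}(\omega; dy))^2\,\hat\mu_{u_n}(\omega; dx)\}^{3/2}} \]

converges to a positive constant for $\mathbb{P}$-a.e. $\omega$. Together with part (d) of Assumption 2.1, (27), and the $\mathbb{P}$-a.s. convergence of $\hat s_{u_n}$ to $s$, this implies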

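\[ n^{-1/2}\,\big|\mathcal{R}_\rho(\mathcal{N}_{n\hat m_{u_n}(\omega), n\hat s_{u_n}^2(\omega)}) - \mathcal{R}_\rho(\hat\mu_{u_n}^{*n}(\omega; \cdot))\big| = \mathcal{O}(n^{-\gamma\beta}) \tag{29} \]

for $\mathbb{P}$-a.e. $\omega$.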

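For $2 < \lambda \le 3$ one can derive (29) in the same way, where on the right-hand side in (27) the expression $\max\{\cdots\}$ has to be replaced by $\int |x - \int y\,\hat\mu_{u_n}(\omega; dy)|^\lambda\,\hat\mu_{u_n}(\omega; dx) / \{\int (x - \int y\,\hat\mu_{u_n}(\omega; dy))^2\,\hat\mu_{u_n}(\omega; dx)\}^{\lambda/2}$. This completes the proof of part (i).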

(ii): The assertion follows from (i)-(ii) of Theorem 2.2 and part (i) of Theorem 2.3.

(iii)-(iv): The assertions can be proven in the same way as the assertions (iv)-(v) of Theorem 2.2; just replace part (iii) of Theorem 2.2 by part (ii) of Theorem 2.3. □


5.3. Proof of Remark




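Let $\rho : L^p \to \mathbb{R}$ be a law-invariant coherent risk measure. First, Theorem 2.8 in [?] ensures that the corresponding risk functional $\mathcal{R}_\rho : \mathcal{M}(L^p) \to \mathbb{R}$ is continuous for the $p$-weak topology $\mathcal{O}_{p\text{-w}}$. The latter is defined to be the coarsest topology on $\mathcal{M}(L^p)$ w.r.t. which each of the maps $\mu \mapsto \int f\,d\mu$, $f \in C_p$, is continuous, where $C_p$ is the set of all continuous functions $f : \mathbb{R} \to \mathbb{R}$ for which there exists a constant $C > 0$ such that $|f(x)| \le C(1 + |x|^p)$ for all $x \in \mathbb{R}$. According to Corollary A.45 in [7] the topological space $(\mathcal{M}(L^p), \mathcal{O}_{p\text{-w}})$ is Polish. Second, the topology $\mathcal{O}_{p\text{-w}}$ is generated by the $L^p$-Wasserstein metric $d_{W_p}$, and the mapping $\mathcal{M}(L^p) \to \mathcal{M}(L^p)$, $\mu \mapsto \mu^{*n}$, is $(d_{W_p}, d_{W_p})$-continuous; see Lemma 8.6 in [3]. Third, the mapping $\omega \mapsto \hat\mu_{u_n}(\omega, \cdot)$ is $(\mathcal{F}, \sigma(\mathcal{O}_{p\text{-w}}))$-measurable. Indeed, it is easily seen that the Borel $\sigma$-algebra $\sigma(\mathcal{O}_{p\text{-w}})$ on $\mathcal{M}(L^p)$ is generated by the maps $\mu \mapsto \int f\,d\mu$, $f \in C_p$. So, for $(\mathcal{F}, \sigma(\mathcal{O}_{p\text{-w}}))$-measurability of the mapping $\Omega \to \mathcal{M}(L^p)$, $\omega \mapsto \hat\mu_{u_n}(\omega, \cdot)$, it suffices to show

\[ \Big( \int f(x)\,\hat\mu_{u_n}(\,\cdot\,, dx) \Big)^{-1}(A) \in \mathcal{F} \quad \text{for all } A \in \mathcal{B}(\mathbb{R}) \text{ and } f \in C_p. \tag{30} \]

Since $\hat\mu_{u_n}(\omega, \cdot)$ is a probability kernel from $(\Omega, \mathcal{F})$ to $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$, the mapping $\omega \mapsto \int f(x)\,\hat\mu_{u_n}(\omega, dx)$ is $(\mathcal{F}, \mathcal{B}(\mathbb{R}))$-measurable for every $f \in C_p$; see e.g. Lemma 1.41 in [8]. This gives (30). Altogether, we have shown that the mapping $\omega \mapsto \mathcal{R}_\rho(\hat\mu_{u_n}^{*n}(\omega, \cdot))$ is $(\mathcal{F}, \mathcal{B}(\mathbb{R}))$-measurable. □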


A. On the computation of $\hat\mu_u^{*n}$ and $\mathcal{R}_\rho(\hat\mu_u^{*n})$

In general, computing the $n$-fold convolution $\mu^{*n}$ of $\mu$ exactly is practically infeasible. However, in real applications the true $\mu$ has support in $h\mathbb{N}_0 := \{0, h, 2h, \ldots\}$ for some fixed $h > 0$, where $h$ represents the smallest monetary unit. We stress the fact that continuous distributions are approximations of the equidistant discrete true single claim distribution, and not vice versa. So the empirical probability measure $\hat\mu_u$ is concentrated on the equidistant grid $h\mathbb{N}_0$, too. In this case the estimated total claim distribution $\hat\mu_u^{*n}$ can be computed with the help of the recursive scheme

\[ \hat\mu_u^{*n}[\{0\}] = \hat\mu_u[\{0\}]^n, \tag{31} \]

\[ \hat\mu_u^{*n}[\{jh\}] = \frac{1}{j\,\hat\mu_u[\{0\}]} \sum_{\ell=1}^{j} \big((n+1)\ell - j\big)\,\hat\mu_u[\{\ell h\}]\,\hat\mu_u^{*n}[\{(j-\ell)h\}] \quad \text{for } j \in \mathbb{N}, \tag{32} \]

provided $\hat\mu_u[\{0\}] > 0$; see the discussion below. Note that $\hat\mu_u$, as an empirical probability measure, has bounded support. Therefore, the whole distribution $\hat\mu_u^{*n}$ can be computed by the scheme (31)-(32) in finitely many steps. In particular, the estimator $\mathcal{R}_\rho(\hat\mu_u^{*n})$ can be computed in finitely many steps even for tail-dependent functionals $\mathcal{R}_\rho$ such as the one associated with the Average Value at Risk (introduced in an earlier section).
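The recursive scheme for the $n$-fold convolution of a distribution on the grid $h\mathbb{N}_0$ with positive mass at zero can be sketched in a few lines of Python. This is only an illustration under stated assumptions: all function names and the sample data are hypothetical, claims are assumed to lie exactly on the grid, the Value at Risk is taken to be the lower $\alpha$-quantile, and the Average Value at Risk is taken to be the average of the quantile function over $(\alpha, 1)$.

```python
from collections import Counter


def empirical_pmf_on_grid(claims, h):
    """Return pmf with pmf[j] = share of claims equal to j*h (claims assumed on the grid)."""
    counts = Counter(round(c / h) for c in claims)
    return [counts.get(j, 0) / len(claims) for j in range(max(counts) + 1)]


def n_fold_convolution(pmf, n):
    """n-fold convolution of a pmf on {0, 1, ..., K} via the recursive scheme."""
    if pmf[0] <= 0:
        raise ValueError("the recursion requires positive mass at zero")
    K = len(pmf) - 1
    out = [0.0] * (n * K + 1)           # the support of the convolution ends at n*K
    out[0] = pmf[0] ** n                # starting value: mass at zero to the power n
    for j in range(1, n * K + 1):       # recursion step for the mass at j*h
        acc = sum(((n + 1) * l - j) * pmf[l] * out[j - l]
                  for l in range(1, min(j, K) + 1))
        out[j] = acc / (j * pmf[0])
    return out


def value_at_risk(pmf, h, alpha):
    """Smallest grid point whose cumulative probability reaches alpha."""
    cum = 0.0
    for j, p in enumerate(pmf):
        cum += p
        if cum >= alpha - 1e-12:        # small tolerance for rounding errors
            return j * h
    return (len(pmf) - 1) * h


def average_value_at_risk(pmf, h, alpha):
    """(1 / (1 - alpha)) times the integral of the quantile function over (alpha, 1)."""
    tail, cum = 0.0, 0.0
    for j, p in enumerate(pmf):
        lo, hi = max(cum, alpha), cum + p
        if hi > lo:                     # part of this atom lies above level alpha
            tail += (hi - lo) * j * h
        cum = hi
    return tail / (1.0 - alpha)


claims = [0, 0, 0, 100, 100, 200, 0, 300, 0, 100]    # made-up data with h = 100
pmf = empirical_pmf_on_grid(claims, h=100)
total = n_fold_convolution(pmf, n=5)
print(value_at_risk(total, 100, 0.99) / 5, average_value_at_risk(total, 100, 0.99) / 5)
```

Because the result has finite support, both functionals are obtained by a single pass over the computed probabilities. For large $n$ the starting value can underflow and the mixed-sign coefficients can amplify rounding errors, so a production implementation would likely need additional numerical safeguards.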

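To justify the scheme (31)-(32), note that the empirical probability measure $\hat\mu_u$ defined earlier has the representation

\[ \hat\mu_u[\,\cdot\,] = p_u\,\hat\nu_u[\,\cdot\,] + (1 - p_u)\,\delta_0[\,\cdot\,], \]

where $p_u := \hat\mu_u[(0, \infty)]$ is the mass of $\hat\mu_u$ on $(0, \infty)$, and $\hat\nu_u[\,\cdot\,] := \hat\mu_u[\,\cdot \cap (0, \infty)]/\hat\mu_u[(0, \infty)]$ is the probability measure $\hat\mu_u$ conditioned on $(0, \infty)$. It is easily seen that the $n$-fold convolution $\hat\mu_u^{*n}$ coincides with the random convolution

\[ \hat\nu_u^{*B_{n,p_u}} := \sum_{k=0}^{n} B_{n,p_u}[\{k\}]\,\hat\nu_u^{*k} \]

of $\hat\nu_u$ w.r.t. the binomial distribution $B_{n,p_u}$ with parameters $n$ and $p_u$, i.e.

\[ \hat\mu_u^{*n} = \hat\nu_u^{*B_{n,p_u}}. \tag{33} \]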


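When $p_u < 1$ and $\hat\nu_u$ has support in $h\mathbb{N} := \{h, 2h, \ldots\}$ for some $h > 0$, the random convolution $\hat\nu_u^{*B_{n,p_u}}$ can be computed with the help of the Panjer recursion [16]:

\[ \hat\nu_u^{*B_{n,p_u}}[\{0\}] = B_{n,p_u}[\{0\}], \tag{34} \]

\[ \hat\nu_u^{*B_{n,p_u}}[\{jh\}] = \frac{p_u}{1 - p_u} \sum_{\ell=1}^{j} \Big( \frac{(n+1)\ell}{j} - 1 \Big)\,\hat\nu_u[\{\ell h\}]\,\hat\nu_u^{*B_{n,p_u}}[\{(j-\ell)h\}] \quad \text{for } j \in \mathbb{N}. \tag{35} \]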

Since $1 - p_u = \hat\mu_u[\{0\}]$ and $p_u\,\hat\nu_u[\{\ell h\}] = \hat\mu_u[\{\ell h\}]$ for $\ell \in \mathbb{N} = \{1, 2, \ldots\}$, the recursive scheme (31)-(32) follows from (33)-(35).
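The identity between the $n$-fold convolution and the binomial random convolution can be checked numerically. The sketch below is a hedged illustration, not the authors' implementation: it runs the Panjer recursion for a binomial claim number in its standard form (coefficients $a = -p/(1-p)$ and $b = (n+1)p/(1-p)$, severity without mass at zero) and compares the result with a brute-force convolution. The function names and the example distribution are made up.

```python
def panjer_binomial(nu, n, p):
    """Random convolution of nu w.r.t. Binomial(n, p) via the Panjer recursion.

    nu is a pmf on {0, 1, ..., K} with nu[0] == 0, and p must be smaller than 1.
    """
    if nu[0] != 0 or not p < 1:
        raise ValueError("need nu[0] == 0 and p < 1")
    K = len(nu) - 1
    g = [0.0] * (n * K + 1)
    g[0] = (1.0 - p) ** n                       # binomial probability of zero claims
    ratio = p / (1.0 - p)
    for j in range(1, n * K + 1):               # Panjer step with a + b*l/j = ratio*((n+1)*l/j - 1)
        g[j] = ratio * sum(((n + 1) * l / j - 1.0) * nu[l] * g[j - l]
                           for l in range(1, min(j, K) + 1))
    return g


def direct_convolution(pmf, n):
    """Brute-force n-fold convolution, used only as a reference."""
    result = [1.0]
    for _ in range(n):
        nxt = [0.0] * (len(result) + len(pmf) - 1)
        for i, a in enumerate(result):
            for k, b in enumerate(pmf):
                nxt[i + k] += a * b
        result = nxt
    return result


mu = [0.5, 0.3, 0.2]                      # made-up single claim pmf on {0, h, 2h}
p = 1.0 - mu[0]                           # mass on the strictly positive grid points
nu = [0.0] + [m / p for m in mu[1:]]      # mu conditioned on being positive
lhs = direct_convolution(mu, 4)
rhs = panjer_binomial(nu, 4, p)
print(max(abs(a - b) for a, b in zip(lhs, rhs)))   # largest absolute discrepancy
```

Splitting off the atom at zero is what makes the recursion applicable: the number of strictly positive claims among $n$ risks is binomial, and the positive claim sizes follow the conditioned distribution.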